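Meta-learning considers the problem of learning an efficient learning process that can leverage past experience to accurately solve new tasks. However, the efficacy of meta-learning depends crucially on the distribution of tasks available for training, which is often assumed to be known a priori or is constructed from limited supervised datasets. In this work, we aim to provide task distributions for meta-learning by considering self-supervised tasks automatically proposed from unlabeled text, to enable large-scale meta-learning in NLP. We design multiple distributions of self-supervised tasks by considering important aspects of task diversity, difficulty, type, domain, and curriculum, and investigate how they affect meta-learning performance. Our analysis shows that all of these factors meaningfully alter the task distribution, with some yielding significant improvements in the downstream performance of the meta-learned models. Empirically, results on 20 downstream tasks show significant improvements in few-shot learning: up to +4.2% absolute accuracy (on average) over previous unsupervised meta-learning methods, and performance comparable to supervised methods.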
Transformer-based models have gained wide popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore learning attention in frequency domains (e.g., Fourier domain, wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in the time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., when a linear kernel is applied to the attention scores). Empirically, we analyze how attention models in different domains behave differently through various synthetic experiments with seasonality, trend and noise, with emphasis on the role of the softmax operation therein. These theoretical and empirical analyses motivate us to propose a new method, TDformer (Trend Decomposition Transformer), which first applies seasonal-trend decomposition and then additively combines an MLP that predicts the trend component with Fourier attention that predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
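As an illustration of the seasonal-trend decomposition step mentioned in this abstract, the following is a minimal sketch that assumes a centered moving average is used to extract the trend. The abstract does not say which decomposition TDformer uses, so the function name `decompose`, the `window` parameter, and the edge padding are all hypothetical choices made for this example.

```python
import numpy as np

def decompose(series, window=25):
    """Split a 1-D series into (seasonal, trend) parts.

    Assumption: the trend is a centered moving average. This is an
    illustrative choice, not necessarily the one used by TDformer.
    """
    series = np.asarray(series, dtype=float)
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    half = window // 2
    # Repeat the edge values so the trend has the same length as the input.
    padded = np.pad(series, half, mode="edge")
    kernel = np.ones(window) / window
    trend = np.convolve(padded, kernel, mode="valid")
    # Whatever the moving average does not explain is treated as seasonal.
    seasonal = series - trend
    return seasonal, trend

# Synthetic series: a linear trend plus a sinusoid with period 24.
t = np.arange(200)
x = 0.05 * t + np.sin(2 * np.pi * t / 24)
seasonal, trend = decompose(x, window=25)
```

By construction the two parts sum back to the input. In the pipeline the abstract describes, the trend part would go to an MLP and the seasonal part to Fourier attention, with the two predictions added; those predictors are not sketched here.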
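We aim to address this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud. We have implemented the profiler by extending prior work to additionally estimate two types of communication stalls: interconnect stalls and network stalls. We train popular DNN models using the profiler to characterize various AWS GPU instances, and we list their advantages and shortcomings so that users can make an informed decision. We observe that the more expensive GPU instances may not be the most performant for all DNN models, and that AWS may allocate hardware interconnect resources sub-optimally. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time, and network-connected instances can suffer a slowdown of up to 5x compared to training on a single instance. Further, we model the impact of DNN macroscopic features, such as the number of layers and the number of gradients, on communication stalls. Finally, we propose a measurement-based recommendation model for users to lower their public cloud monetary costs for DDL.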